What makes a word: Learning base units in Japanese for speech recognition
Author: Laura
Abstract
We describe an automatic process for learning word units in Japanese. Since the Japanese orthography has no spaces delimiting words, the first step in building a Japanese speech recognition system is to define the units that will be recognized. Our method applies a compound-finding algorithm, previously used to find word sequences in English, to learning syllable sequences in Japanese. We report that we were able not only to extract meaningful units, eliminating the need for possibly inconsistent manual segmentation, but also to decrease perplexity using this automatic procedure, which relies on a statistical, not syntactic, measure of relevance. Our algorithm also uncovers the kinds of environments that help the recognizer predict phonological alternations, which are often hidden by morphologically motivated tokenization.
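The general idea of learning units by repeatedly joining adjacent syllables can be illustrated with a minimal sketch. The abstract does not name the statistical measure or the stopping rule, so everything specific here is an assumption made for illustration: pointwise mutual information (PMI) stands in for the measure of relevance, and the function names `best_pair`, `merge_pair`, and `learn_units`, along with the `min_count` and `max_merges` parameters, are hypothetical. This is not the authors' algorithm, and it does not evaluate perplexity.

```python
import math
from collections import Counter


def best_pair(corpus, min_count=2):
    """Return the adjacent unit pair with the highest positive PMI, or None.

    PMI is an assumed stand-in for the paper's unspecified statistical measure.
    """
    unigrams = Counter(u for sent in corpus for u in sent)
    bigrams = Counter(p for sent in corpus for p in zip(sent, sent[1:]))
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    best, best_score = None, 0.0
    for (a, b), n in bigrams.items():
        if n < min_count:  # skip pairs too rare to score reliably
            continue
        p_ab = n / total_bi
        p_a = unigrams[a] / total_uni
        p_b = unigrams[b] / total_uni
        pmi = math.log(p_ab / (p_a * p_b))
        if pmi > best_score:
            best, best_score = (a, b), pmi
    return best


def merge_pair(sentence, pair):
    """Join every non-overlapping occurrence of `pair` into a single unit."""
    merged, i = [], 0
    while i < len(sentence):
        if i + 1 < len(sentence) and (sentence[i], sentence[i + 1]) == pair:
            merged.append(sentence[i] + sentence[i + 1])
            i += 2
        else:
            merged.append(sentence[i])
            i += 1
    return merged


def learn_units(corpus, max_merges=10, min_count=2):
    """Greedily merge the best-scoring pair until none qualifies."""
    corpus = [list(sent) for sent in corpus]
    for _ in range(max_merges):
        pair = best_pair(corpus, min_count)
        if pair is None:
            break
        corpus = [merge_pair(sent, pair) for sent in corpus]
    return corpus
```

Each input sentence is a list of romanized syllables, for example `["wa", "ta", "shi", "wa"]`. The returned sentences contain the same syllables in the same order, grouped into longer units wherever a pair scored above the threshold.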
Publication date: 1997